-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to skip password when serializing filesystem #1625
Conversation
I should start by saying that including passwords directly in code/scripts is not generally a good idea. Many or the backends provide alternative ways to get credentials at runtime, from certain files, environment variables or other means. Am I right in thinking that you are simply looking for dictionary keys that happen to be called "password"? That would not include other possibly sensitive values like keys and tokens, but I don't suppose there's any way to know whether some value is sensitive or not. I am mildly against removing any values by default, as it would break the principal function of dict/json serialization, to faithfully recreate the original instance. How would a user re-apply the password when they need it, or how would they know in the first place that it was missing? However, it might be suitable to provide a "safe ser" mode, but perhaps each backend implementation would declare which instance kwargs are to be considered sensitive? |
Yes, it's that straightforward.
This would be a more comprehensive way to filter out sensitive data, but would require more time to implement. I think we should release a simple/partial patch first.
It should not take much digging into an authentication error to realize that the password is missing. Do many people expect that serialization includes sensitive data by default? The requirement of passing |
Let's make removing "password" opt-in for now and rettain the same default behaviour for now.
Unfortunately, you commonly get a "permission error" or something more arcane without further details. Most security systems will not tell you specifically that a password is missing. Of course, the FS repr doesn't include whether a password is set or not. I don't know, but you may even get an attribute-error for some backends if you simply don't set the value (should it be None by default, or ""?).
It depends! If the purpose is to pass instances within a secured network, then yes. Pickle may well do that in a worker cluster, after all. Where instances need to be (re)created from a set of arguments plus additional information from the user or environment, you may well need a helper like Intake to encode this intent. |
Alright, I have updated the PR accordingly. |
Currently,
AbstractFileSystem.to_dict
(and by extensionAbstractFileSystem.to_json
) store all arguments used to construct the object. However, there are some cases where the user may provide a password, such as inFTPFileSystem
,SMBFileSystem
, andWebHDFS
. This may cause security issues, especially considering that the password is stored in plain text.This PR disables the aforementioned behaviour by default. Users now have to passAfter some discussion, this feature is now opt-in. I have added a warning to the documentation to remind users to be careful about this.include_password=True
to the relevant methods if they wish to store the password anyway.